Automatic Segmentation of Speech at the Phonetic Level

نویسندگان

  • Jon Ander Gómez
  • María José Castro Bleda
چکیده

A complete automatic speech segmentation technique has been studied in order to eliminate the need for manually segmented sentences. The goal is to fix the phoneme boundaries using only the speech waveform and the phonetic sequence of the sentences. The phonetic boundaries are established using a Dynamic Time Warping algorithm that uses the a posteriori probabilities of each phonetic unit given the acoustic frame. These a posteriori probabilities are calculated by combining the probabilities of acoustic classes which are obtained from a clustering procedure on the feature space and the conditional probabilities of each acoustic class with respect to each phonetic unit. The usefulness of the approach presented here is that manually segmented data is not needed in order to train acoustic models. The results of the obtained segmentation are similar to those obtained using the HTK toolkit with the “flat-start” option activated. Finally, results using Artificial Neural Networks and manually segmented data are also reported for comparison purposes.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Regional Pronunciation Variants for Automatic Segmentation

The goal of this paper is to create an extended rule corpus with approximately 2300 phonetic rules which model segmental variation of regional variants of German. The phonetic rules express at a broad-phonetic level phenomena of phonetic reduction in German that occurs within words and across word boundaries. In order to get an improvement in automatic segmentation of regional speech variants, ...

متن کامل

Improvements on Automatic Speech Segmentation at the Phonetic Level

In this paper, we present some recent improvements in our automatic speech segmentation system, which only needs the speech signal and the phonetic sequence of each sentence of a corpus to be trained. It estimates a GMM by using all the sentences of the training subcorpus, where each Gaussian distribution represents an acoustic class, which probability densities are combined with a set of condi...

متن کامل

Speech Recognition using Acoustic Landmarks and Binary Phonetic Feature Classifiers

In spite of decades of research, Automatic Speech Recognition (ASR) is far from reaching the goal of performance close to Human Speech Recognition (HSR). One of the reasons for unsatisfactory performance of the state-of-the-art ASR systems, that are based largely on Hidden Markov Models (HMMs), is the inferior acoustic modeling of low level or phonetic level linguistic information in the speech...

متن کامل

Evaluation of a segmentation system based on multi-level lattices

This paper addresses the problem of the evaluation of an automatic continuous speech segmentation system based on multi-level lattices called dendrograms previously described in [2] [3]. After a short overview of the segmentation framework, we briefly discuss the difficulties inherent to such an evaluation. We first state quantitative results based on the usual quality coefficient and oversegme...

متن کامل

A Comparison of Different Approaches to Automatic Speech Segmentation

We compare different methods for obtaining accurate speech segmentations starting from the corresponding orthography. The complete segmentation process can be decomposed into two basic steps. First, a phonetic transcription is automatically produced with the help of large vocabulary continuous speech recognition (LVCSR). Then, the phonetic information and the speech signal serve as input to a s...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2002